SQL Server 2012 : T-SQL Enhancements - The MERGE Statement (part 1)

7/13/2013 7:41:20 PM

The MERGE statement does just what its name says. It combines the normal insert, update, and delete operations involved in a typical merge scenario, along with the select operation that provides the source and target data for the merge. Essentially, that means it combines four statements into one. In fact, you can combine five statements into one using the OUTPUT clause, and even more than that with INSERT OVER DML (a special T-SQL syntax, which we cover next).

Prior to SQL Server 2008, multiple separate statements were required to achieve what can now be accomplished with a single MERGE statement. This statement has a flexible syntax that allows you to exercise fine control over source and target matching, as well as the various set-based DML actions carried out on the target. The result is simpler code that’s easier to write and maintain, and that often runs faster, than the equivalent code using separate statements.

This first example uses MERGE to manage stocks and trades. Begin by creating the two tables to hold stocks that you own and daily trades that you make, as shown in Example 1.

Example 1. Creating the Stock and Trade tables.

CREATE TABLE Stock(Symbol varchar(10) PRIMARY KEY, Qty int CHECK (Qty > 0))
CREATE TABLE Trade(Symbol varchar(10) PRIMARY KEY, Delta int CHECK (Delta <> 0))

You start off with 10 shares of Adventure Works stock and 5 shares of Blue Yonder Airlines stock. These are stored in the Stock table:

INSERT INTO Stock VALUES ('ADVW', 10)
INSERT INTO Stock VALUES ('BYA', 5)

During the day, you conduct three trades. You buy 5 more shares of Adventure Works, sell 5 shares of Blue Yonder Airlines, and buy 3 shares as a new investment in Northwind Traders. These are stored in the Trade table, as follows:

INSERT INTO Trade VALUES('ADVW', 5)
INSERT INTO Trade VALUES('BYA', -5)
INSERT INTO Trade VALUES('NWT', 3)

Here are the contents of the two tables:

SELECT * FROM Stock
GO

Symbol     Qty
---------- -----------
ADVW       10
BYA        5

(2 row(s) affected)


SELECT * FROM Trade
GO

Symbol     Delta
---------- -----------
ADVW       5
BYA        -5
NWT        3

(3 row(s) affected)

At the closing of the day, you want to update the quantities in the Stock table to reflect the trades of the day you recorded in the Trade table. Your Adventure Works stock quantity has risen to 15, you no longer own any Blue Yonder Airlines (having sold the only 5 shares you owned), and you now own 3 new shares of Northwind Traders stock. That’s going to involve joining the Stock and Trade tables to detect changes in stock quantities resulting from your trades, as well as insert, update, and delete operations to apply those changes back to the Stock table. All this logic and manipulation can be performed with a single statement using MERGE, as shown in Example 2.

Example 2. Applying trades to stocks using MERGE.

MERGE Stock
 USING Trade
 ON Stock.Symbol = Trade.Symbol
 WHEN MATCHED AND (Stock.Qty + Trade.Delta = 0) THEN
   -- delete stock if entirely sold
   DELETE
 WHEN MATCHED THEN
   -- update stock quantity (delete takes precedence over update)
   UPDATE SET Stock.Qty += Trade.Delta
 WHEN NOT MATCHED BY TARGET THEN
   -- add newly purchased stock
   INSERT VALUES (Trade.Symbol, Trade.Delta);

Let’s dissect this statement. It begins with the MERGE keyword, which names the target of the merge operation. The USING keyword then identifies the source, and the ON keyword specifies the joining keys used to relate the source and target to each other. Three merge clauses follow, each using the WHEN…THEN syntax, and the statement is finally terminated with a semicolon (;).

Important

The statement-terminating semicolon (part of the SQL standard) is usually unnecessary in SQL Server. However, the MERGE statement absolutely requires it, and you will receive an error if you omit it.

1. Defining the Merge Source and Target

MERGE has a very elegant implementation in SQL Server. At its core, it operates on a join between the source and target of the merge no differently than the way any standard JOIN predicate in the FROM clause of any SELECT statement works. As you’ll see shortly when you examine SQL Server’s query plan for MERGE, the source, the target, and the join between them are handled internally in exactly the same manner as for a regular SELECT. The parts of the MERGE syntax that express this select operation include the MERGE keyword itself, along with USING and ON, which respectively specify the target, source, and join predicate, as shown in this snippet from Example 2:

MERGE Stock
 USING Trade
 ON Stock.Symbol = Trade.Symbol

The target can be any table or updateable view and is specified immediately following the MERGE keyword. It is the recipient of the changes resulting from the merge, which can include combinations of new, changed, and deleted rows. In this example, the Stock table is the target, and it receives changes merged into it from daily trade information (the source).

The source is the provider of the data, which is the Trade table in this example, and is specified with the USING keyword right after the target. The source can be virtually anything. This includes not only regular tables and views, but also subqueries, CTEs, table variables, remote tables, table-valued functions (TVFs), table-valued parameters (TVPs), and even text files accessed with OPENROWSET(BULK…). Fundamentally, anything that is valid in the FROM clause of a SELECT statement is valid as the source of a MERGE statement, nothing more and nothing less.
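As a sketch of a non-table source, the following statement merges from a derived table (a subquery) instead of directly from a table. It assumes a hypothetical TradeLog table, which is not part of this example, that can hold several trades per symbol. The subquery nets them out to one row per symbol, which matters because MERGE raises an error if more than one source row matches the same target row for an update or delete.

```sql
-- Hypothetical table (an assumption for illustration): unlike Trade, it has
-- no primary key on Symbol, so the same stock can be traded more than once.
CREATE TABLE TradeLog(Symbol varchar(10) NOT NULL, Delta int CHECK (Delta <> 0))
GO

MERGE Stock
 USING (SELECT Symbol, SUM(Delta) AS Delta
         FROM TradeLog
         GROUP BY Symbol
         HAVING SUM(Delta) <> 0) AS NetTrade  -- skip trades that cancel out
 ON Stock.Symbol = NetTrade.Symbol
 WHEN MATCHED AND (Stock.Qty + NetTrade.Delta = 0) THEN
   -- delete stock if entirely sold
   DELETE
 WHEN MATCHED THEN
   -- apply the net quantity change for the symbol
   UPDATE SET Stock.Qty += NetTrade.Delta
 WHEN NOT MATCHED BY TARGET THEN
   -- add newly purchased stock
   INSERT VALUES (NetTrade.Symbol, NetTrade.Delta);
```

Because TradeLog is empty when it is first created, running this sketch as written leaves the Stock table unchanged.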

The join predicate specified by the ON keyword that follows defines the column key or keys relating the source and target to each other, no differently than a standard table join. Again, anything you can put in a SELECT join can be specified for a MERGE join with ON. The join defines which records are considered matching or nonmatching between the source and target. In this example, source and target tables are related by the Symbol column. The join predicate tells SQL Server which stocks exist in both tables and which exist in only one of them, so that you can insert, update, and delete data in the target table accordingly. The type of join (inner, left outer, right outer, or full outer) is determined by which of the various merge clauses are applied next in the MERGE statement.
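To get a feel for how the join classifies rows, you can preview the matching with an ordinary SELECT. This is only an illustrative sketch: the MatchState column and the S and T aliases are names made up here, and the query is not what SQL Server literally executes for MERGE.

```sql
-- Preview how each row would be classified by the ON predicate.
-- Symbol is a primary key in both tables, so it is NULL only on the
-- side of the full outer join where no matching row exists.
SELECT
  COALESCE(S.Symbol, T.Symbol) AS Symbol,
  S.Qty,
  T.Delta,
  CASE
    WHEN S.Symbol IS NULL THEN 'NOT MATCHED BY TARGET'
    WHEN T.Symbol IS NULL THEN 'NOT MATCHED BY SOURCE'
    ELSE 'MATCHED'
  END AS MatchState
 FROM Stock AS S
 FULL OUTER JOIN Trade AS T ON S.Symbol = T.Symbol
```

With the sample data as originally inserted, the ADVW and BYA rows are classified as matched, and the NWT row as not matched by target.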

2. The WHEN MATCHED Clause

The previous example uses three merge clauses: two WHEN MATCHED clauses and one WHEN NOT MATCHED BY TARGET clause. Let’s look at each of them closely.

The first WHEN MATCHED clause executes when a matching stock symbol is found in both the Stock and the Trade tables, as shown in this code snippet:

WHEN MATCHED AND (Stock.Qty + Trade.Delta = 0) THEN
   -- delete stock if entirely sold
   DELETE

A match would normally mean updating the quantity value in the Stock table by the delta value (amount bought or sold) in the Trade table. However, in this scenario, you want to physically delete the row in the Stock table if its updated value results in 0, because that means you don’t really own that particular stock at all anymore (as is the case with Blue Yonder Airlines). You can code for that scenario by qualifying the WHEN MATCHED clause with an additional predicate that tests whether the stock quantity resulting from the trade yields 0. This gives you flexibility to provide your own criteria as predicates to your merge clauses, and apply custom business logic as filters to the various matching conditions. In this particular case, you want to remove a row from the Stock table using the DELETE statement rather than changing its value to 0.

The next merge clause is another WHEN MATCHED clause, but this second one has no predicate qualifying the match condition, as shown in this code snippet:

WHEN MATCHED THEN
   -- update stock quantity (delete takes precedence over update)
   UPDATE SET Stock.Qty += Trade.Delta

This second clause handles all the other trades of preexisting stock that have not resulted in 0 and changes the stock quantity accordingly using the UPDATE statement (that is, the Stock.Qty values will go up or down depending on the positive or negative number in Trade.Delta). In this example, you want the Adventure Works stock quantity to go up to 15, reflecting the 5 shares purchased on top of the 10 you already owned.

Note

An error would occur if you tried to sell more than you owned. In fact, an error would also occur if you tried to sell everything that you owned without first catching that condition by deleting the stock in the earlier merge clause. That’s because a check constraint on the Qty column (defined when you created the Stock table at the beginning of the example) instructs the database not to tolerate any zero or negative Qty values.
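You can observe the check constraint at work without touching MERGE at all. The following sketch deliberately tries to drive a quantity below zero with a plain UPDATE and reports the resulting error; the quantity of 100 is an arbitrary value chosen to exceed any holding in this example.

```sql
BEGIN TRY
  -- Attempt to sell far more Adventure Works shares than are owned.
  -- The CHECK (Qty > 0) constraint rejects the change.
  UPDATE Stock SET Qty = Qty - 100 WHERE Symbol = 'ADVW'
END TRY
BEGIN CATCH
  -- Report the constraint violation instead of aborting the batch.
  SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage
END CATCH
```

Because the update fails, the Stock table is left unchanged.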

SQL Server has very particular rules governing the use of multiple merge clauses. You are permitted to have one or two WHEN MATCHED clauses—but no more. If there are two WHEN MATCHED clauses, the first one must be qualified with an AND condition, as this example has shown. Furthermore, one clause must specify an UPDATE, and the other must specify a DELETE. As demonstrated, MERGE will choose one of the two WHEN MATCHED clauses to execute based on whether the AND condition evaluates to true for any given row.

3. The WHEN NOT MATCHED BY TARGET Clause

The last merge clause is WHEN NOT MATCHED BY TARGET, as shown in this snippet:

WHEN NOT MATCHED BY TARGET THEN
   -- add newly purchased stock
   INSERT VALUES (Trade.Symbol, Trade.Delta);

This clause handles rows found in the source but not in the target. This refers to stocks that are being traded for the first time, which is the new Northwind Traders stock that doesn’t yet exist in the target Stock table. The clause has no predicate (although it could), and so there are no additional conditions for the clause. Here, you simply add the new data found in the Trade table to the Stock table using the INSERT statement.

Note

The BY TARGET keywords are optional for this clause. WHEN NOT MATCHED is equivalent to WHEN NOT MATCHED BY TARGET.

Only one WHEN NOT MATCHED BY TARGET clause is permitted in a single MERGE statement. It can be qualified with an AND condition, as you saw earlier with WHEN MATCHED. (There is no purpose for an AND condition on the WHEN NOT MATCHED BY TARGET clause in this example.)
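For illustration only, here is a sketch of what such a qualified clause could look like. The Trade.Delta > 0 predicate is an assumption made up for this sketch: it would skip a sale of stock that you never owned instead of attempting an insert that the check constraint would reject. The statement is wrapped in a transaction that is rolled back, so the tables are left as they were.

```sql
BEGIN TRANSACTION

MERGE Stock
 USING Trade
 ON Stock.Symbol = Trade.Symbol
 WHEN NOT MATCHED BY TARGET AND (Trade.Delta > 0) THEN
   -- add newly purchased stock, ignoring sales of stock never owned
   INSERT VALUES (Trade.Symbol, Trade.Delta);

-- undo the change so the example data stays intact
ROLLBACK TRANSACTION
```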

After executing the MERGE statement in Example 2, the Stock table is updated to reflect all the trades of the day merged into it, as shown here:

SELECT * FROM Stock
GO

Symbol     Qty
---------- -----------
ADVW       15
NWT        3

(2 row(s) affected)

Just as desired and expected, Blue Yonder Airlines is gone, Northwind Traders has been added, and Adventure Works has been updated. This is a rather impressive result for just one statement! It took less code to write and will take less effort to maintain than the equivalent operations written as separate statements would. It typically runs faster as well, because it is compiled and executed as a single statement that builds on the same fundamental principles as the FROM and JOIN clauses of a basic SELECT statement.
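For comparison, here is a rough sketch of how the same result might be reached with separate statements. It is one possible approach, not the only one, and it assumes that Stock and Trade are in their original state (before Example 2 was run). The @NewSymbols table variable is a helper introduced for this sketch. It is needed because the statements interfere with each other: once Blue Yonder Airlines is deleted, it would otherwise look like a brand-new stock to the final INSERT.

```sql
BEGIN TRANSACTION

-- Remember which traded symbols are new before Stock is modified.
DECLARE @NewSymbols TABLE(Symbol varchar(10) PRIMARY KEY)
INSERT INTO @NewSymbols(Symbol)
 SELECT T.Symbol
  FROM Trade AS T
  WHERE NOT EXISTS (SELECT 1 FROM Stock AS S WHERE S.Symbol = T.Symbol)

-- Delete stocks that were entirely sold.
DELETE S
 FROM Stock AS S
 INNER JOIN Trade AS T ON T.Symbol = S.Symbol
 WHERE S.Qty + T.Delta = 0

-- Update the quantities of the remaining traded stocks.
UPDATE S
 SET S.Qty = S.Qty + T.Delta
 FROM Stock AS S
 INNER JOIN Trade AS T ON T.Symbol = S.Symbol

-- Add the newly purchased stocks identified in the first step.
INSERT INTO Stock(Symbol, Qty)
 SELECT T.Symbol, T.Delta
  FROM Trade AS T
  INNER JOIN @NewSymbols AS N ON N.Symbol = T.Symbol

COMMIT TRANSACTION
```

Four statements, an explicit transaction, and careful ordering are needed here to do what the single MERGE statement expresses declaratively.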

4. Using MERGE for Table Replication

Let’s move on to another example that shows how MERGE can be used as a tool for achieving simple replication between two tables. First, define the tables Original and Replica with identical schemas. Then create a stored procedure with a MERGE statement that replicates changes made in the Original table over to the Replica table, as shown in Example 3.

Example 3. Creating two tables and a stored procedure that uses MERGE to synchronize them.

CREATE TABLE Original(PK int PRIMARY KEY, FName varchar(10), Number int)
CREATE TABLE Replica(PK int PRIMARY KEY, FName varchar(10), Number int)
GO

CREATE PROCEDURE uspSyncReplica AS
 MERGE Replica AS R
  USING Original AS O ON O.PK = R.PK
  WHEN MATCHED AND (O.FName != R.FName OR O.Number != R.Number) THEN
    UPDATE SET R.FName = O.FName, R.Number = O.Number
  WHEN NOT MATCHED THEN
    INSERT VALUES(O.PK, O.FName, O.Number)
  WHEN NOT MATCHED BY SOURCE THEN
    DELETE;

The MERGE statement in this stored procedure handles the replication task by joining the two tables on their primary keys (PK) and providing three merge clauses. The first clause processes updates, as shown here:

WHEN MATCHED AND (O.FName != R.FName OR O.Number != R.Number) THEN
  UPDATE SET R.FName = O.FName, R.Number = O.Number

Here, WHEN MATCHED is used to find all the records that exist in both the original and the replica, and then the UPDATE statement updates the matching rows on the replica side with the latest original data. Performing such an update when no data has actually changed is wasteful, so the predicate qualifies the merge clause to apply only when a row change is detected in any of the nonkey values between the original and the replica.
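One caveat is worth noting. The FName and Number columns allow NULL, and a != comparison does not evaluate to true when either side is NULL, so a change from or to NULL would go undetected by this predicate. A common workaround, sketched below as an alternative and not as part of the original procedure, is to compare the columns with EXCEPT, which treats two NULLs as equal.

```sql
MERGE Replica AS R
 USING Original AS O ON O.PK = R.PK
 -- The EXCEPT query returns a row only when the nonkey values differ,
 -- including differences that involve NULL on either side.
 WHEN MATCHED AND EXISTS (SELECT O.FName, O.Number
                          EXCEPT
                          SELECT R.FName, R.Number) THEN
   UPDATE SET R.FName = O.FName, R.Number = O.Number;
```

If the columns were declared NOT NULL, the simpler != predicate would be sufficient.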

The second merge clause handles insertions, as follows:

WHEN NOT MATCHED THEN
  INSERT VALUES(O.PK, O.FName, O.Number)

As mentioned earlier, WHEN NOT MATCHED is equivalent to WHEN NOT MATCHED BY TARGET, which returns all the original rows not found in the replica table. These records represent new rows added to the original table since the last merge, which are now added to the replica as well using the INSERT statement in this clause.